Abstract

In  this  paper,  the  problem  of  clustering  observations  into 
homogeneous  groups  based  on  given  characteristics  of  the  observations  is 
analyzed.  Three  distinct  integer  programming  formulations  covering 
important  variations  of  the  clustering  problem  are  developed.  These 
variations  include  finding  natural  clusters,  constraining  the  number  of 
clusters  and  restricting  the  size  of  clusters.  Efficient  heuristic  techniques 
employing  Lagrangian  and  eigenvector  based  methods  are  developed  to 
solve  these  problems. 

1.        INTRODUCTION

Classification  has  a  rich  history,  but  numerical  methods  used  for  the 
purpose  of  classification  are  fairly  recent.  The  major  developments  have 
occurred  in  the  last  two  decades.  Sokal  and  Sneath  (1963)  published  one  of 
the  first  books  on  this  subject. 

Classification  (or  typology)  is  concerned  with  the  identification  of  an 
observation  and  its  placement  into  a  homogeneous  group  based  on  some 
characteristics.  The  pursuit  of  classification  can  be  seen  in  all  fields.  For 
example, in judicial science, Supreme Court judges may be grouped on the
basis of their legal opinions on a sample of cases. In psychology and
consumer behaviour, people may be classified according to their personality
and taste characteristics. In international marketing, world markets can
be classified into segments based on cultural, socio-economic and political
characteristics. In strategic management, firms in an industry are classified
according to the production, financial and marketing strategies they use. In
engineering design, parts are classified according to their
geometrical, tolerance and machining characteristics.
Classification and cluster analysis have been applied in the following areas:

biology (Everitt, 1980), data reorganization (McCormick et al., 1972),
medicine (Klastorin, 1982), pattern recognition (Tou and Gonzalez,
1974), part selection in automated systems (Kusiak, 1985a),
production flow analysis (King, 1980), race mixture study (Rao, 1977),
task selection (Nagai et al., 1980), and control engineering (Siljak, 1984).

In cases where it is possible to specify groups a priori, statistical
techniques such as multiple discriminant analysis provide an analytical
method to define typology functions (Green, 1978). When it is not
possible to specify these groups, one must resort to combinatorial
algorithms and heuristics to construct the clusters.

An  assumption  underlying  the  use  of  clustering  techniques  is  that 
homogeneous  clusters  actually  exist  in  the  data.  The  basic  problem  in  cluster 
analysis  is  to  devise  algorithms  and  heuristics  that  group  entities  into 
clusters based on observed attributes. The development of these heuristics
and algorithms has typically depended on a conceptual representation of the
process of clustering. These representations have been largely visual and
are of two distinct types: matrix representations and graph
representations.

Matrix  representations  have  usually  been  used  in  the  domain  of  social 
sciences. One of the first applications in market segmentation and
selection was by Green et al. (1967), who sought to match representative
test markets with larger product markets. Here, a variety of market
characteristics  were  gathered  for  a  number  of  potential  test  markets  and 
arranged  in  a  matrix-type  representation  with  rows  representing  cities  and 
columns,  the  market  characteristics.  The  object  was  to  rearrange  all  those 
rows,  which  were  "similar",  such  that  they  were  adjacent  in  the  permuted 
matrix.  As  is  often  the  case,  the  market  characteristics  were  measured  in 
different  scales,  and  therefore,  had  to  be  normalised  (re-scaled  to  have  a 
mean  of  0  and  a  standard  deviation  of  1)  before  similarity  measurements 
using  weighted  Euclidean  distances  were  used. 



Another application of the matrix representation is in the area of
group technology, which concerns itself with grouping machines (and
consequently, the parts that can be produced on the machines) so as to form
independent manufacturing cells (Burbidge, 1975; King, 1980). In this
application,  rows  represent  the  machines  and  columns  represent  the  parts 
produced.  The  matrix  entries  are  binary,  1  representing  the  use  of  the 
machine  for  the  part  and  0  otherwise.  The  object  is  to  permute  the  rows  and 
columns  so  as  to  obtain  a  block  diagonal  representation  of  the  original 
matrix,  with  each  block  representing  a  cluster. 

Graph  representations  have  usually  been  used  in  the  engineering 
sciences  field,  particularly  electrical  engineering.  One  application  arises  in 
the design and monitoring of power system operations (Stagg et al., 1970;
Bills, 1970). Here, a weighted graph representation is used to depict the
network  of  power  grid  buses,  with  the  nodes  representing  the  machines, 
such  as  transformers,  and  the  edges  (or  arcs)  representing  the  interlinking 
connections  between  these  buses.  The  admittance  between  these  buses  is 
taken  as  the  weight  on  (or  capacity  of)  the  edges.  The  object  is  to  decompose 
the  graph  into  sub-graphs  (by  deleting  edges)  such  that  there  are  minimal 
interconnections  between  the  sub-graphs  (and  hence  maximal  connections 
within  the  sub-graphs). 

Another application arises in the design of very large scale integration
(VLSI) circuits (Kernighan and Lin, 1970). The circuits are represented as
graphs  with  the  electrical  elements,  such  as  resistors,  being  the  nodes  of  the 
graph  and  the  wiring  between  these  elements  representing  the  edges.  The 
purpose  of  this  representation  is  to  find  a  way  to  partition  this  graph  so  as  to 
maximize  the  number  of  circuits  that  can  be  packed  into  the  chips. 

In this paper, we describe three distinct integer programming
formulations which cover important variations of these two representations.
We characterize the integer programming formulations by two constraints:

(1)  a fixed number of clusters

(2)  a restriction on the number of elements within each cluster.

The three integer programming formulations presented allow one to
deal with these two constraints. The first formulation (P1) incorporates
neither of these constraints; that is, we allow the algorithm to
generate natural clusters. Since the first formulation may generate many
clusters, a second formulation (P2) is developed which restricts the
number of clusters. Finally, we consider a third formulation (P3) which
restricts both the number of clusters and the cluster sizes.

The  paper  is  divided  into  five  sections.  In  Section  2,  we  discuss  a 
clustering  problem  with  no  restrictions  on  the  number  of  clusters  and  cluster 
sizes.  A  clustering  problem  with  a  fixed  number  of  clusters  is  presented  in 
Section  3.  A  Lagrangian  relaxation  approach  is  used  to  solve  this  problem. 
In  Section  4,  we  formulate  and  solve  a  clustering  problem  with  a  fixed 
number  of  clusters  and  cluster  sizes.  An  eigenvector  based  approach  is  used 
in  the  subsequent  analysis.  Conclusions  are  presented  in  Section  5. 

2.        A CLUSTERING PROBLEM WITHOUT ANY CONSTRAINTS

2.1  Problem Formulation

Typically,  one  first  formulates  a  clustering  problem,  where  there  is  no 
a  priori  information  regarding  the  number  of  clusters  and  cluster  sizes.  In 
this  case  the  resulting  clusters  are  usually  generated  by  visual  inspection. 

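Before formulating a clustering problem that does not restrict the
number of clusters and cluster sizes, let us consider a 0-1 matrix
$A = [a_{ij}]_{m \times n}$. For any two row vectors
$a_i = [a_{i1}, \ldots, a_{ik}, \ldots, a_{in}]$ and
$a_j = [a_{j1}, \ldots, a_{jk}, \ldots, a_{jn}]$ of matrix $A$,
define a distance

$$d_{ij} = \sum_{k=1}^{n} \delta(a_{ik}, a_{jk}) \tag{1a}$$

where

$$\delta(a_{ik}, a_{jk}) = \begin{cases} 1 & \text{if } a_{ik} = a_{jk} = 1 \\ 0 & \text{otherwise} \end{cases} \tag{1b}$$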

In this clustering problem, we attempt to permute the rows and columns
of matrix $A$ to maximize the sum of the distances $d_{ij}$ between any two
adjacent rows and between any two adjacent columns, respectively. It can be
formulated as follows:

$$\text{(P1)} \qquad \max \; \sum_{i=1}^{m-1} \sum_{j=i+1}^{m} d_{ij} + \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} d_{ij} \tag{2}$$

for all $n!\,m!$ possible matrices obtained by permuting the rows and columns
of the initial matrix $A$. In (2) the first double sum is taken over rows and
the second over columns, with the column distances defined analogously to (1a).

Lenstra (1974) has shown that problem (P1) is equivalent to two
travelling  salesman  problems.  Based  on  this  fact  the  following  two 
conclusions  can  be  drawn: 

(1)  this  clustering  problem  is  an  NP-complete  problem 

(2)  a  travelling  salesman  algorithm  can  be  applied  to  solve  the  clustering 
problem. 
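
The row part of this objective can be made concrete with a small sketch. It is not the authors' procedure: it assumes that $\delta$ in (1b) counts the columns in which both rows contain a 1, it scores only adjacent rows of the permuted matrix (the reading given in the prose), and it enumerates all row orders, which is practical only for very small m. The names `row_distance`, `adjacent_row_score` and `best_row_order` are illustrative.

```python
from itertools import permutations

def row_distance(a, b):
    """d_ij of (1a): number of columns in which both 0-1 rows hold a 1."""
    return sum(1 for x, y in zip(a, b) if x == 1 and y == 1)

def adjacent_row_score(matrix, order):
    """Sum of d_ij over consecutive rows of the matrix permuted by `order`."""
    return sum(row_distance(matrix[p], matrix[q])
               for p, q in zip(order, order[1:]))

def best_row_order(matrix):
    """Exhaustive search over all m! row orders (illustration only)."""
    return max(permutations(range(len(matrix))),
               key=lambda order: adjacent_row_score(matrix, order))

# Rows 0 and 2 share their 1s, as do rows 1 and 3.
A = [[1, 1, 0, 0],
     [0, 0, 1, 1],
     [1, 1, 0, 0],
     [0, 0, 1, 1]]
order = best_row_order(A)
score = adjacent_row_score(A, order)
print(order, score)
```

Applying the same search to the transposed matrix would handle the columns. Because the two searches are independent, this mirrors the decomposition into two travelling salesman problems, where a travelling salesman heuristic would replace the enumeration.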

2.2    Algorithms for solving problem (P1)

To date a large number of algorithms for solving problem (P1) have
been developed by researchers working in many different areas. Some of
the most efficient heuristic algorithms have been discussed in Kusiak (1985),
namely:

(1)  McCormick et al. (1972)

(2)  Bhat and Haupt (1976)

(3)  King (1980, 1982)

(4)  rank energy (Kusiak, 1985).

All of these algorithms are based on rearranging rows and columns of matrix
A  to  produce  some  visible  clusters.  The  difference  between  them  is  in  the 
way  this  rearrangement  is  performed. 

Computational  complexity  of  each  of  these  algorithms  is  shown  in 
Table  1. 

Table 1.  Computational Complexities of Clustering Algorithms

Algorithm                   Complexity
McCormick et al. (1972)     $O_M(nm^2 + n^2 m)$
Bhat and Haupt (1976)       $O_B$ [expression illegible in source]
King (1982)                 $O_K(mn \log mn)$
Rank Energy                 $O_R((m + n)^2)$

One can notice that the following inequality holds: $O_B < O_R < O_K < O_M$.

3.         A  CLUSTERING  PROBLEM  WITH  FIXED  NUMBER  OF  CLUSTERS 

3.1  Problem  Formulation 

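In order to formulate this problem let us introduce the following
notation:

n      number of elements

m      required number of clusters

$d_{ij}$    distance from element i to element j ($d_{ij} \ge 0$, $\forall\, i \ne j$, $i, j = 1, \ldots, n$, and
       $d_{ii} = 0$, $\forall\, i = 1, \ldots, n$)

$$x_{ij} = \begin{cases} 1 & \text{if the } i\text{th element belongs to the } j\text{th cluster} \\ 0 & \text{otherwise} \end{cases}$$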
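
The objective function minimizes the total sum of distances from the
elements of each cluster to the cluster median:

$$\text{(P2)} \qquad Z_x = \min Z(x) = \sum_{i=1}^{n} \sum_{j=1}^{n} d_{ij} x_{ij} \tag{3}$$

$$\text{s.t.} \quad \sum_{j=1}^{n} x_{ij} = 1, \quad \forall\, i = 1, \ldots, n \tag{4}$$

$$\sum_{j=1}^{n} x_{jj} = m \tag{5}$$

$$x_{ij} \le x_{jj}, \quad \forall\, i = 1, \ldots, n, \quad \forall\, j = 1, \ldots, n \tag{6}$$

$$x_{ij} = 0, 1, \quad \forall\, i = 1, \ldots, n, \quad \forall\, j = 1, \ldots, n \tag{7}$$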

Constraint  (4)  ensures  that  each  element  belongs  to  exactly  one  cluster. 
Constraint  (5)  specifies  a  required  number  of  clusters.  Constraint  (6) 
ensures  that  a  cluster  j  is  formed  when  a  corresponding  element  is  a 
median.  The  last  constraint  (7)  imposes  integrality. 
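
For very small instances, (P2) can be solved by enumeration, which gives a reference value for checking any heuristic. The sketch below is an illustration rather than part of the paper: it tries every set of m medians (the indices j with $x_{jj} = 1$) and assigns each element to its nearest median, which is optimal for a fixed median set. The function name `solve_p2_bruteforce` and the six-point test data are assumptions.

```python
from itertools import combinations

def solve_p2_bruteforce(d, m):
    """Enumerate all median sets of size m and return (cost, medians).

    For a fixed median set, constraints (4)-(7) are satisfied at least
    cost by assigning every element to its nearest median.
    """
    n = len(d)
    best_cost, best_medians = float("inf"), None
    for medians in combinations(range(n), m):
        cost = sum(min(d[i][j] for j in medians) for i in range(n))
        if cost < best_cost:
            best_cost, best_medians = cost, medians
    return best_cost, best_medians

# Six points on a line forming two obvious groups.
points = [0, 1, 2, 10, 11, 12]
d = [[abs(p - q) for q in points] for p in points]
cost, medians = solve_p2_bruteforce(d, 2)
print(cost, medians)
```

The number of median sets grows combinatorially with n, which is why the relaxation described next is of interest.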

3.2  A Subgradient Algorithm

Problem  (P2)  has  been  solved  by  Mulvey  and  Crowder  (1979)  but  a 
more  efficient  subgradient  algorithm  is  presented  here.  The  main 
difference  between  the  proposed  algorithm  and  that  of  Mulvey  and 
Crowder  (1979)  is  in  the  procedure  of  computing  lower  bounds.  The 
algorithm  of  Mulvey  and  Crowder  (1979)  computes  the  lower  bounds  based 
upon a heuristic algorithm developed by Ward (1963). The presented
subgradient algorithm is based on a simple procedure of computing lower
bounds shown in Arthanari and Dodge (1981).

Dualizing on constraint (4), the objective function (3) is transformed
as follows (for $u_i \ge 0$, $\forall\, i = 1, \ldots, n$):

$$Z_u = \min Z(u_i) = \sum_{i=1}^{n} \sum_{j=1}^{n} d_{ij} x_{ij} + \sum_{i=1}^{n} u_i \Bigl(1 - \sum_{j=1}^{n} x_{ij}\Bigr) \tag{8}$$

Reordering (8), the following relaxed problem is obtained:

$$\text{(P}_u\text{)} \qquad Z_u = \min Z(u_i) = \sum_{i=1}^{n} \sum_{j=1}^{n} (d_{ij} - u_i)\, x_{ij} + \sum_{i=1}^{n} u_i \tag{9}$$

s.t.  (5), (6) and (7).

The best choice of u is an optimal solution to the dual problem

$$\text{(D)} \qquad Z_D = \max_{u} Z_u \tag{10}$$

s.t.  (5), (6) and (7).

Framework of the Subgradient Algorithm

In the subgradient algorithm one specifies initial values of the Lagrangian
multipliers $u_i^0$, and in each iteration $k+1$ an updated sequence $u_i^{k+1}$ is
generated as follows:

$$u_i^{k+1} = u_i^k + a^k g_i^k \tag{11}$$

where:  $a^k$ is a positive scalar step size

        $g_i^k$ is a subgradient; in the case of problem (P2)

$$g_i = 1 - \sum_{j} x^*_{ij}, \tag{12}$$

where $x^*_{ij}$ is an optimal solution to problem (P_u).

The most commonly used step size is

$$a^k = \frac{\gamma^k \,(UB^k - Z_u^k)}{\lVert g^k \rVert}, \tag{13}$$

where:  $\gamma^k$ is a scalar satisfying $0 < \gamma^k < 2$ (see Motzkin and Schoenberg, 1954)
        $UB^k$ is an upper bound on $Z_D$
        $\lVert \cdot \rVert$ is the Euclidean norm.

To compute $UB^k$ in our subgradient algorithm, a simple heuristic generating a
feasible solution to problem (P2) is used.

In order to solve the dual problem (D), the following general
framework of a subgradient algorithm is applied:

Step 0.  Set iteration number k = 0 and choose initial values of the Lagrangian
         multipliers $u_i^k$, i = 1, ..., n.

Step 1.  Solve problem (P_u) for the current $u_i^k$. The value obtained, $Z_u$, is a lower
         bound on the value of the objective function $Z_D$ in (D).

Step 2.  Generate a feasible solution to problem (P2). The value $Z_x$ is an
         upper bound on the value of the objective function $Z_D$ in (D).

Step 3.  If the current solution to problem (D) satisfies a given stopping
         criterion, stop; otherwise update the multipliers by (11), set
         k = k + 1 and go to Step 1.

Lower Bounds Procedure

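A procedure for computing the lower bounds given in Arthanari and
Dodge (1981) will be applied. Let us denote:

$$s_{ij} = \min(d_{ij} - u_i, \; 0) \tag{14}$$

and let $S_j = \sum_{i=1}^{n} s_{ij}$.

To minimize (9), let us arrange the values of $S_j$ in increasing order,
take the first m of them, $S_{j(1)} \le S_{j(2)} \le \ldots \le S_{j(m)}$, and let the
set $\{j(1), j(2), \ldots, j(m)\} = L$. The optimal solution to problem (P_u) is then

$$x^*_{ij} = \begin{cases} 1 & \text{if } i = j \in L \\ 0 & \text{otherwise} \end{cases} \tag{15}$$

and

$$x^*_{ij} = \begin{cases} 1 & \text{if } i \ne j, \; j \in L \text{ and } d_{ij} - u_i < 0 \\ 0 & \text{otherwise} \end{cases} \tag{16}$$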

Substituting $x^*_{ij}$ of (15) and (16) into (9), a lower bound for problem
(D) is obtained.
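
The lower bounds procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: an off-diagonal $x^*_{ij}$ is set to 1 only when $j \in L$ and the reduced cost $d_{ij} - u_i$ is negative, which is the choice consistent with (14), and ties between equal $S_j$ are broken by index. The function name `lagrangian_lower_bound` is hypothetical.

```python
def lagrangian_lower_bound(d, u, m):
    """Return (Z_u, L, g) for multipliers u, following (14)-(16) and (12)."""
    n = len(d)
    # S_j = sum_i min(d_ij - u_i, 0), see (14)
    S = [sum(min(d[i][j] - u[i], 0.0) for i in range(n)) for j in range(n)]
    # L holds the m columns with the smallest S_j (ties broken by index)
    L = sorted(sorted(range(n), key=lambda j: (S[j], j))[:m])
    # Value of (9) at the relaxed optimum
    z_u = sum(S[j] for j in L) + sum(u)
    # Subgradient (12): g_i = 1 - sum_j x*_ij
    g = [1 - sum(1 for j in L if i == j or d[i][j] - u[i] < 0)
         for i in range(n)]
    return z_u, L, g

points = [0, 1, 2, 10, 11, 12]
d = [[abs(p - q) for q in points] for p in points]
z_u, L, g = lagrangian_lower_bound(d, [3.0] * 6, 2)
print(z_u, L, g)
```

On this six-point instance with all multipliers equal to 3, the bound is 4 and every subgradient component is zero. The relaxed solution therefore already satisfies constraint (4).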

Upper Bounds Procedure

A feasible solution to problem (P2) can be computed in the way
shown in Arthanari and Dodge (1981), namely:

$$x_{ii} = \begin{cases} 1 & \text{if } i = j \in L \\ 0 & \text{otherwise} \end{cases} \tag{17}$$

and

$$x_{ij} = \begin{cases} 1 & \text{if } i \ne j \text{ and } d_{ij} = \min_{r \in L} d_{ir} \\ 0 & \text{otherwise} \end{cases} \tag{18}$$

One can easily see that the above solutions satisfy all constraints of
problem (P2).

Substituting all $x_{ij}$ into (3), an upper bound for problem (D) is
obtained.

Subgradient Algorithm

The algorithm for solving problem (D) is as follows:

Step 0.  Set k = 1, $u_i^0 \ge 0$, $\varepsilon_1 > 0$, $\varepsilon_2 > 0$, $\gamma^0 > 0$, $UB^0 = +\infty$, $LB^0 = -\infty$, where:
         $u_i^0$   initial values of the Lagrangian multipliers
         $\varepsilon_1, \varepsilon_2$   precision values
         $\gamma^0$   initial value of the scalar ($0 < \gamma^0 < 2$)
         $UB^0$   initial upper bound on (10)
         $LB^0$   initial lower bound on (10)

Step 1.  Compute a feasible solution for (P2) from (17) and (18) in order
         to obtain a value $Z_x^k$ of (3).
         Compute an upper bound on (9):
         $UB^k = \min\{UB^{k-1}, Z_x^k\}$.

Step 2.  Compute the values of $x^*_{ij}$ from (15) and (16) and substitute
         into (9) to obtain a value $Z_u^k$ for the updated values of $u_i^k$, i = 1, ..., n.
         Compute a lower bound on $Z_D$:
         $LB^k = \max(LB^{k-1}, Z_u^k)$.
         If $Z_u^k < LB^k$, then reduce $\gamma^k$.
         If $\gamma^k < \varepsilon_1$, stop; otherwise continue.
         If $(UB^k - LB^k)/UB^k < \varepsilon_2$, stop; otherwise go to Step 3.

Step 3.  Compute the following:

         (a)  subgradients $g_i^k$ at $x^*_{ij}$

$$g_i^k = 1 - \sum_{j=1}^{n} x^*_{ij}$$

         (b)  step size

$$a^k = \frac{\gamma^k \,(UB^k - Z_u^k)}{\lVert g^k \rVert}$$

         (c)  updated values of the Lagrangian multipliers

$$u_i^{k+1} = u_i^k + \frac{a^k g_i^k}{\lVert g^k \rVert}$$

         Set k = k + 1 and go to Step 1.
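
The steps above can be sketched in a few lines. The code below is an illustration under explicit assumptions, not the authors' implementation. The median set L is taken from the lower bound computation of the same iteration. The step size and multiplier update are combined into the usual net step $\gamma (UB - Z_u)\, g / \lVert g \rVert^2$. "Reduce $\gamma$" is read as halving. Multipliers are projected onto $u_i \ge 0$. The names `lower_bound`, `upper_bound` and `subgradient` are hypothetical.

```python
import math

def lower_bound(d, u, m):
    """Relaxed problem (P_u): returns (Z_u, L, g) as in (14)-(16), (12)."""
    n = len(d)
    S = [sum(min(d[i][j] - u[i], 0.0) for i in range(n)) for j in range(n)]
    L = sorted(sorted(range(n), key=lambda j: (S[j], j))[:m])
    z_u = sum(S[j] for j in L) + sum(u)
    g = [1 - sum(1 for j in L if i == j or d[i][j] - u[i] < 0)
         for i in range(n)]
    return z_u, L, g

def upper_bound(d, L):
    """Feasible solution (17)-(18): each element joins its nearest median."""
    assign = [min(L, key=lambda j: d[i][j]) for i in range(len(d))]
    return sum(d[i][assign[i]] for i in range(len(d))), assign

def subgradient(d, m, gamma=0.75, eps1=1e-3, eps2=1e-3, max_iter=200):
    u = [1.1 * max(row) for row in d]      # starting values from Section 3.3
    ub, lb, best = math.inf, -math.inf, None
    for _ in range(max_iter):
        z_u, L, g = lower_bound(d, u, m)
        z_x, assign = upper_bound(d, L)
        if z_x < ub:
            ub, best = z_x, assign
        if z_u > lb:
            lb = z_u
        else:
            gamma /= 2.0                   # assumption: "reduce" means halve
        if gamma < eps1 or ub - lb <= eps2 * ub:
            break
        norm_sq = sum(gi * gi for gi in g)
        if norm_sq == 0:
            break                          # relaxed solution is feasible
        step = gamma * (ub - z_u) / norm_sq
        u = [max(0.0, ui + step * gi) for ui, gi in zip(u, g)]
    return ub, lb, best

points = [0, 1, 2, 10, 11, 12]
d = [[abs(p - q) for q in points] for p in points]
ub, lb, best = subgradient(d, 2)
print(ub, lb, best)
```

On the six-point instance the bounds meet at 4 after three iterations, with elements 0 to 2 assigned to median 1 and elements 3 to 5 to median 4.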

3.3      Computational Results

The subgradient algorithm described has been applied to solve a
number of problems. For each problem the distances $d_{ij}$ were generated by
a uniform, continuous random number generator. Different values of the initial
parameters $u_i^0$ and $\gamma^0$ have been tested. The algorithm performed well for
$u_i^0 = 1.1 \max_j (d_{ij})$ and $\gamma^0 = 0.75$, which were determined experimentally.

Tables 2 and 3 show the number of iterations and CPU time (in
seconds) for 20 different problems with the precision values $\varepsilon_2 = 5\%$ and
$\varepsilon_2 = 0.1\%$, respectively.

Table 2.  CPU time and number of iterations for problems solved with
the precision value $\varepsilon_2 = \frac{UB - LB}{UB} \cdot 100\% = 5\%$

m    n              10     20     30     40     50     60     70     80     90    100

5    iterations      5      5      4      5      5      5      5      5      5      5
     CPU time     0.39   1.04   2.05   3.31   4.88   6.65   8.86  11.46  14.40  17.11

10   iterations      5      5      4      4      4      4      4      4      4      4
     CPU time     0.23   0.77   1.26   2.15   3.30   4.65   6.31   8.22  10.28  12.52

Table 3.  CPU time and number of iterations for problems solved with
the precision value $\varepsilon_2 = \frac{UB - LB}{UB} \cdot 100\% = 0.1\%$

m    n              10     20     30     40     50     60     70     80     90    100

5    iterations      8      8      8      8      8      8      8      8      8     13
     CPU time     0.52   1.69   3.28   5.34   7.94  10.87  14.36  18.66  23.17  45.51

10   iterations      8      7      7      7      7      7      7      7      7      7
     CPU time     0.38   1.08   2.28   3.90   5.92   8.34  10.81  14.68  18.50  22.28

As  one  can  see  in  Tables  2  and  3  the  algorithm  requires  a  small  number  of 
iterations  to  generate  a  good  quality  feasible  solution  or  in  many  cases  the 
optimal  solution. 

To show the efficiency of this algorithm we have solved five sets of
different problems with it and compared the results obtained with
those presented by Mulvey and Crowder (1979). Table 4 illustrates this
comparison.

Table 4.  Comparison of the Subgradient Algorithm to the Algorithm
of Mulvey and Crowder (1979)

                                Number of iterations for m = 5
Problem    Number of        Mulvey and Crowder      Proposed
number     attributes n     (1979) algorithm        algorithm

1          25               26                      6
2          50               74                      7
3          70               22                      7
4          80               82                      7
5          100              25                      7

All  the  above  computations  were  performed  on  a  CDC  CYBER  170-720 
computer. On average, the algorithm presented requires a much smaller
number of iterations than the Mulvey and Crowder (1979) algorithm to solve
a problem of the same size.

4.        A CLUSTERING PROBLEM WITH FIXED NUMBER OF CLUSTERS
AND CLUSTER SIZES

4.1.  Problem Formulation

The clustering problem formulations described in the last two sections
may not necessarily generate clusters of a desirable size. Very large clusters or
a large number of very small clusters may be a consequence of these
clustering algorithms. In this section, we formulate an eigenvector based
approach which allows a fixed number of clusters of fixed size to be
generated. We begin by introducing the following two definitions. Consider
an undirected graph G = (V, E) where $d_{ij}$ is a distance measure between
elements $v_i$ and $v_j$.

Definition 1.  A k-cluster of G(V, E) is obtained by deleting edges
of G to obtain k disconnected subgraphs $G_i = (V_i, E_i)$, i = 1, 2, ..., k,
and $\bigcup_{i=1}^{k} V_i = V$.

Definition 2.  An optimal k-cluster is a k-cluster which minimizes
the sum of the intra-cluster distances of the k clusters.

The optimal k-clustering problem is a generalization of the k-means
problem (Hartigan, 1975). The main difference is that we impose a limit on
the cluster size. We formulate the optimal k-clustering problem as a 0-1
quadratic programming (0-1 QP) problem. Since n elements are to be divided
among k clusters, we assign to each element i the variables $x_{i1}, x_{i2}, \ldots, x_{ik}$,
where

$$x_{ij} = \begin{cases} 1 & \text{if element } i \text{ is assigned to cluster } j \\ 0 & \text{otherwise} \end{cases}$$

Each element i is in exactly one cluster, thus

$$\sum_{j=1}^{k} x_{ij} = 1, \quad \forall\, i = 1, 2, \ldots, n \tag{19}$$

Cluster j has exactly $m_j$ elements in it. Therefore, we add the following set
of constraints to (19):

$$\sum_{i=1}^{n} x_{ij} = m_j, \quad \forall\, j = 1, 2, \ldots, k$$

Each edge in cluster l (l = 1, 2, ..., k) is represented by the node
product $x_{il} x_{jl}$. Note that the edge joining element i to element j is
included in cluster l if and only if $x_{il} = x_{jl} = 1$.

If $d_{ij}$ is the distance between elements i and j, then the total
of all distances within the k clusters is given by

$$\sum_{l=1}^{k} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} d_{ij}\, x_{il}\, x_{jl}$$

The 0-1 QP problem formulation of the optimal k-clustering problem is:

$$\text{(P3)} \qquad \min \sum_{l=1}^{k} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} d_{ij}\, x_{il}\, x_{jl} \tag{20}$$

$$\text{s.t.} \quad \sum_{j=1}^{k} x_{ij} = 1, \quad \forall\, i = 1, 2, \ldots, n \tag{21}$$

$$\sum_{i=1}^{n} x_{ij} = m_j, \quad \forall\, j = 1, 2, \ldots, k \tag{22}$$

$$x_{ij} = 0 \text{ or } 1, \quad \forall\, i = 1, 2, \ldots, n, \quad \forall\, j = 1, 2, \ldots, k \tag{23}$$

4.2  An  Approximation  Algorithm  for  Solving  Problem  (P3) 

An eigenvector based approach is described for finding an
approximate solution to problem (P3). This eigenanalysis approach is a
simple extension of an approach used by Barnes (1982) and Vannelli (1984)
to partition the nodes of a graph subject to the constraints given in (P3). In
that setting the objective function is maximized. Clearly, problem
(P3) is equivalent to

$$\text{(NP3)} \qquad \max \sum_{l=1}^{k} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} (-d_{ij})\, x_{il}\, x_{jl} \tag{24}$$

$$\text{s.t.} \quad \sum_{j=1}^{k} x_{ij} = 1, \quad \forall\, i = 1, \ldots, n \tag{25}$$

$$\sum_{i=1}^{n} x_{ij} = m_j, \quad \forall\, j = 1, \ldots, k \tag{26}$$

$$x_{ij} = 0 \text{ or } 1, \quad \forall\, i = 1, \ldots, n, \quad \forall\, j = 1, \ldots, k. \tag{27}$$

Given that $-d_{ij} \in -D$, Barnes (1982) shows that problem (NP3) can be
approximated by the linear transportation problem

$$\text{(TP3)} \qquad \max \sum_{j=1}^{k} \sum_{i=1}^{n} \frac{u_{ij}}{\sqrt{m_j}}\, x_{ij} \tag{28}$$

$$\text{s.t.} \quad \sum_{j=1}^{k} x_{ij} = 1, \quad \forall\, i = 1, \ldots, n \tag{29}$$

$$\sum_{i=1}^{n} x_{ij} = m_j, \quad \forall\, j = 1, \ldots, k \tag{30}$$

$$x_{ij} \ge 0, \quad \forall\, i = 1, \ldots, n, \quad \forall\, j = 1, \ldots, k \tag{31}$$

where $\lambda_1 \ge \ldots \ge \lambda_k$ are the k largest eigenvalues of $-D$ (k smallest eigenvalues
of $D$) and $u_1, u_2, \ldots, u_k$ are the corresponding eigenvectors, with $u_{ij}$
denoting the ith component of $u_j$.

The linear transportation problem can be solved in $O(n^3)$ time
(Lawler, 1976).

4.3      A  Numerical  Example 

We apply the approximation algorithm given in Section 4.2 to the
following food data problem given in Hartigan (1975, p. 88).

Table 5.  Clustering Problem from Hartigan (1975)

Food type    Attribute 1    Attribute 2    Calcium

1            13             21             1
2            5              36             1
3            5              37             2
4            11             29             1
5            8              30             1
6            12             27             1
7            6              31             2
8            4              29             1

Consider the distance measure

$$d_{ij} = \lVert a_i - a_j \rVert_2^2$$

where $a_i$ is the ith food type row. For example,

$$d_{12} = (13-5)^2 + (21-36)^2 + (1-1)^2 = 289.$$

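
The 8x8 distance matrix D of the food data representation of Table 5 is

$$D = \begin{bmatrix}
0 & 289 & 321 & 68 & 106 & 37 & 150 & 145 \\
289 & 0 & 2 & 85 & 45 & 130 & 27 & 50 \\
321 & 2 & 0 & 101 & 59 & 150 & 37 & 66 \\
68 & 85 & 101 & 0 & 10 & 5 & 30 & 49 \\
106 & 45 & 59 & 10 & 0 & 25 & 6 & 17 \\
37 & 130 & 150 & 5 & 25 & 0 & 53 & 68 \\
150 & 27 & 37 & 30 & 6 & 53 & 0 & 9 \\
145 & 50 & 66 & 49 & 17 & 68 & 9 & 0
\end{bmatrix} \tag{32}$$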


If we wish to find two optimal clusters of D where each group has four
elements, we find the two largest eigenvalues of $-D$, which are 488.4 and
98.059, respectively. The corresponding eigenvectors are

$$u_1^T = [.655,\; -.447,\; -.495,\; .1102,\; -.0503,\; .259,\; -.1709,\; -.124]$$

$$u_2^T = [.363,\; .1658,\; .338,\; -.405,\; -.4737,\; -.308,\; -.3798,\; -.315]$$
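
The largest eigenvalue quoted above can be checked without a linear algebra library. The sketch below is an illustration, not the method used in the paper: it runs power iteration on $-D$ shifted by the largest row sum of D, a Gershgorin bound that makes the shifted matrix positive semidefinite so that the iteration converges to the largest eigenvalue of $-D$. The function name `largest_eigenpair` and the iteration count are assumptions, and the sign of the returned eigenvector is arbitrary.

```python
import math

D = [
    [0, 289, 321, 68, 106, 37, 150, 145],
    [289, 0, 2, 85, 45, 130, 27, 50],
    [321, 2, 0, 101, 59, 150, 37, 66],
    [68, 85, 101, 0, 10, 5, 30, 49],
    [106, 45, 59, 10, 0, 25, 6, 17],
    [37, 130, 150, 5, 25, 0, 53, 68],
    [150, 27, 37, 30, 6, 53, 0, 9],
    [145, 50, 66, 49, 17, 68, 9, 0],
]

def largest_eigenpair(M, shift, iterations=500):
    """Power iteration on M + shift*I; returns (eigenvalue of M, unit vector)."""
    n = len(M)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        w = [sum(M[i][j] * v[j] for j in range(n)) + shift * v[i]
             for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    Mv = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
    lam = sum(v[i] * Mv[i] for i in range(n))   # Rayleigh quotient
    return lam, v

neg_D = [[-x for x in row] for row in D]
shift = max(sum(row) for row in D)              # Gershgorin bound on |eigenvalues|
lam1, vec1 = largest_eigenpair(neg_D, shift)
residual = max(abs(sum(neg_D[i][j] * vec1[j] for j in range(8)) - lam1 * vec1[i])
               for i in range(8))
print(round(lam1, 1), residual)
```

The second eigenpair could be obtained the same way after deflating the first, although a library eigensolver would normally be used in practice.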

The transportation problem approximation of the optimal k-cluster problem
(P3) is

$$\max \; \frac{1}{2} \sum_{j=1}^{2} \sum_{i=1}^{8} u_{ij}\, x_{ij}$$

$$\begin{aligned}
\text{s.t.} \quad x_{11} + x_{12} &= 1 \\
x_{21} + x_{22} &= 1 \\
&\;\;\vdots \\
x_{81} + x_{82} &= 1 \\
x_{11} + x_{21} + \ldots + x_{81} &= 4 \\
x_{ij} &\ge 0
\end{aligned}$$

The  solution  of  this  problem  is  to  group  elements  1,  4,  5,  and  6  in  one  cluster 
and  the  others  in  the  second  cluster.  The  resulting  clusters  are  obtained  by 
permuting  the  rows  and  columns  of  D  into 


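$$D = \left[\begin{array}{cccc|cccc}
0 & 10 & 68 & 5 & 30 & 49 & 85 & 101 \\
10 & 0 & 106 & 25 & 6 & 17 & 45 & 59 \\
68 & 106 & 0 & 37 & 150 & 145 & 289 & 321 \\
5 & 25 & 37 & 0 & 53 & 68 & 130 & 150 \\
\hline
30 & 6 & 150 & 53 & 0 & 9 & 27 & 37 \\
49 & 17 & 145 & 68 & 9 & 0 & 50 & 66 \\
85 & 45 & 289 & 130 & 27 & 50 & 0 & 2 \\
101 & 59 & 321 & 150 & 37 & 66 & 2 & 0
\end{array}\right]$$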

Note  that  the  sum  of  the  intra-cluster  elements  in  D  is  small  in  this  case. 
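
For two clusters of equal size the transportation problem has a simple closed form, which the sketch below uses to reproduce the grouping and to compare it with exhaustive enumeration of (P3). It is an illustration under assumptions, not the authors' code. With $x_{i2} = 1 - x_{i1}$ the objective reduces to choosing the four elements with the largest $u_{i1} - u_{i2}$. The sixth component of $u_2$ is taken as $-.308$, the sign under which $u_2$ is orthogonal to $u_1$. The function names are hypothetical.

```python
from itertools import combinations

D = [
    [0, 289, 321, 68, 106, 37, 150, 145],
    [289, 0, 2, 85, 45, 130, 27, 50],
    [321, 2, 0, 101, 59, 150, 37, 66],
    [68, 85, 101, 0, 10, 5, 30, 49],
    [106, 45, 59, 10, 0, 25, 6, 17],
    [37, 130, 150, 5, 25, 0, 53, 68],
    [150, 27, 37, 30, 6, 53, 0, 9],
    [145, 50, 66, 49, 17, 68, 9, 0],
]
u1 = [.655, -.447, -.495, .1102, -.0503, .259, -.1709, -.124]
u2 = [.363, .1658, .338, -.405, -.4737, -.308, -.3798, -.315]  # sign of 6th entry assumed

def round_two_equal_clusters(u1, u2, size):
    """Transportation problem for k = 2, equal sizes: top `size` by u_i1 - u_i2."""
    ranked = sorted(range(len(u1)), key=lambda i: u1[i] - u2[i], reverse=True)
    return sorted(ranked[:size]), sorted(ranked[size:])

def intra_cluster_distance(D, clusters):
    """Objective (20): sum of d_ij over pairs inside the same cluster."""
    return sum(D[i][j] for c in clusters for i, j in combinations(c, 2))

def best_balanced_split(D, size):
    """Exhaustive solution of (P3) for two clusters of `size` elements each."""
    n = len(D)
    best = None
    for first in combinations(range(n), size):
        if 0 not in first:
            continue                      # each split is counted once
        second = tuple(i for i in range(n) if i not in first)
        cost = intra_cluster_distance(D, (first, second))
        if best is None or cost < best[0]:
            best = (cost, first, second)
    return best

first, second = round_two_equal_clusters(u1, u2, 4)
cost = intra_cluster_distance(D, (first, second))
exact = best_balanced_split(D, 4)
print([i + 1 for i in first], [i + 1 for i in second], cost, exact[0])
```

On this data the rounded solution groups elements 1, 4, 5 and 6, as stated above, with a total intra-cluster distance of 442. The enumeration over all 35 balanced splits finds the same split, so the approximation happens to be exact in this example.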

5.        CONCLUSIONS

The  clustering  problem  has  been  of  interest  to  many  researchers 
working  in  different  areas.  In  this  paper,  an  attempt  has  been  made  to 
present  a  uniform  view  of  the  clustering  problem.  Two  popular 
representations  of  this  problem  are  matrix  models  and  graph  models.  In  the 
matrix representation, rows are rearranged such that "similar" rows are
adjacent in the permuted matrix. In clustering problems modelled by
graphs, the object is to decompose the graph into sub-graphs such that there
are minimal interconnections between the sub-graphs.

Three  distinct  integer  programming  formulations,  which  cover 
important  variations  of  these  two  representations  were  developed.  First,  we 
considered  the  problem  of  finding  natural  clusters.  The  problem  was  shown 
to  be  equivalent  to  two  travelling  salesman  problems,  which  can  be  solved 
by  efficient  heuristic  techniques.  Second,  a  clustering  problem  with  a  fixed 
number  of  clusters  was  formulated.  A  Lagrangian  relaxation  method  was 
developed  for  solving  this  problem.  An  efficient  subgradient  algorithm  was 
developed  and  was  shown  to  require  a  much  smaller  number  of  iterations 
than  the  Mulvey  and  Crowder  (1979)  algorithm.  Finally,  a  clustering 
problem  with  a  fixed  number  of  clusters  and  cluster  sizes  was  formulated. 
An  eigenvector  approach  led  to  an  approximation  of  the  original  problem  by 
a  linear  transportation  problem. 


6.        ACKNOWLEDGEMENTS 

Research  of  the  first  author  (A.  Kusiak)  has  been  partially  supported 
by  the  Natural  Sciences  and  Engineering  Research  Council  of  Canada. 
Research  of  the  second  author  (A.  Vannelli)  has  been  partially  supported  by 
the  Natural  Sciences  and  Engineering  Research  Council  of  Canada  (Grant  No. 
U0381). 



REFERENCES 


1.  Arthanari, T.S. and Y. Dodge (1981), Mathematical Programming in
    Statistics. Wiley, New York.

2.  Barnes, E.R. (1982), An algorithm for partitioning the nodes of a graph,
    SIAM J. of Algebraic and Discrete Methods, Vol. 3, pp. 541-550.

3.  Bills, G.W. (1970), On-line Stability Analysis Study, Final Report for
    Project RP90-1, Edison Electric Institute.

4.  Burbidge, J.L. (1975), The Introduction of Group Technology. Halsted
    Press, John Wiley, New York (chapter 9).

5.  Bhat, M.V. and A. Haupt (1976), An efficient clustering algorithm, IEEE
    Transactions on Systems, Man and Cybernetics, Vol. SMC-6, pp. 61-64.

6.  Everitt, B. (1980), Cluster Analysis. Halsted Press, New York.

7.  Green, P.E. (1978), Analyzing Multivariate Data. Dryden Press,
    Hinsdale, pp. 290-335.

8.  Green, P.E., R.E. Frank, and J. Robinson (1967), Cluster analysis in test
    market selection, Management Science, Vol. 13, pp. 387-400.

9.  Kernighan, B.W. and S. Lin (1970), An efficient procedure for
    partitioning graphs, Bell Systems Technical Journal, pp. 291-307.

10. King, J.R. (1980), Machine-component group formation in production
    flow analysis: An approach using a rank order clustering
    algorithm, International Journal of Production Research, Vol. 18,
    pp. 213-232.

11. Klastorin, T.D. (1982), An alternative method for hospital partition
    determination using hierarchical cluster analysis, Operations
    Research, Vol. 30, pp. 1134-1146.


12. Kusiak, A. (1983), Computer aided data base design, Working Paper
    No. 9/83, Dept. of Industrial Engineering, Technical University
    of Nova Scotia, Halifax, N.S.

13. Kusiak, A. (1985), Clustering problem: Database application,
    algorithms and computational result, Working Paper #84/02,
    Department of Mechanical and Industrial Engineering,
    University of Manitoba (submitted for publication in the
    European Journal of Operational Research).

14. Kusiak, A. (1985a), Flexible manufacturing systems: A structural
    approach, International Journal of Production Research
    (forthcoming).

15. Lawler, E.L. (1976), Combinatorial Optimization: Networks and
    Matroids. Holt, Rinehart and Winston, New York.

16. Lenstra, J.K. (1974), Clustering a data array and the traveling salesman
    problem, Operations Research, Vol. 22, pp. 413-414.

17. McCormick, W.T., P.J. Schweitzer, and T.W. White (1972), Problem
    decomposition and data reorganization by clustering technique,
    Operations Research, Vol. 20, pp. 993-1009.

18. Motzkin, T. and I.J. Schoenberg (1954), The relaxation methods for
    linear inequalities, Canadian Journal of Mathematics, Vol. 6, pp.
    393-404.

19. Mulvey, J.M. and H.P. Crowder (1979), Cluster analysis: An application
    of Lagrangean relaxation, Management Science, Vol. 25,
    pp. 329-340.

20. Nagori, Y., S. Tenda, and T. Shinga (1980), Determination of similar
    task types by the use of the multidimensional classification
    method: towards improving quality of working life and job
    satisfaction, International Journal of Production Research,
    Vol. 18, pp. 307-322.


21. Rao, G.R. (1977), Cluster analysis applied to a study of race mixture in
    human populations, in: J.V. Ryzin, Ed., Classification and
    Clustering. Academic Press, New York.

22. Siljak, D.D. and M.F. Sezer (1984), Nested decomposition into weakly
    coupled components, IFAC 9th World Congress, Budapest,
    Hungary, July 2-6, 1984.

23. Sokal, R.R. and P.H. Sneath (1963), Principles of Taxonomy. Freeman,
    London.

24. Stagg, C.W., J.P. Dopazo, O.A. Kilton and L.S. VanSlyk (1970),
    Techniques for real-time monitoring of power operating
    systems, IEEE Transactions on PAS, Vol. 89, pp. 545-555.

25. Tou, J.T. and R.C. Gonzalez (1974), Pattern Recognition Principles.
    Addison-Wesley, Reading, Massachusetts.

26. Vannelli, A. (1985), Approximating a class of graph decomposition
    problems by linear transportation problems, IBM Research
    Report RC 10584 (#47380), Journal of Classification (to appear).

27. Ward, J.H., Jr. (1963), Hierarchical grouping to optimize an objective
    function, Journal of American Statistical Association, Vol. 58, pp.
    236-244.



